Here are the packages we’ll be using:
library(statnet)
library(igraph)
library(intergraph)
{igraph} and {statnet} are two of the most commonly used packages for social network analyses. Both include built-in functions that can plot networks and answer almost every beginner-level question about a given network. However, there may be some functions that are present in one package but not the other (or just work better). So, {intergraph} is another useful package that helps users transition between {igraph} and {statnet}.
When analyzing social network data, we want to know to what degree our variables impact the formation of a particular network organization. Our independent variables or predictor variables are node level variables (e.g. age) or similarity between node level variables (e.g. whether pairs belong to the same age category). Our dependent variable is the organization of edges in the network.
We cannot use standard OLS regressions to analyze pair characteristics and the presence of edges between pairs because it violates the assumption that observations are independent. For example, observations A-B, A-C, and A-D are not independent because they all involve individual A. Or if one bonobo is GG rubbing with 5 other bonobos, those 5 other bonobos may also be GG rubbing with each other. Quadratic assignment procedure (QAP) addresses this issue of non-independence.
QAP is a resampling-based method that controls for non-independence
in network structure via random permutations. These permutations use
different arrangements of the rows and columns in the adjacency matrix.
Thus, the network structure is maintained but the arrangement of
individuals in the structure is randomized. This represents the null
hypothesis because it should eliminate any potential correlations
between ties and independent values.
There are 3 types of QAP regressions we can run with {statnet}: correlation regressions, linear regressions, and multiple regressions. We will describe and demonstrate each regression using the bonobo data.
Correlation QAPs simply test whether two social networks are correlated. For example, using our bonobo datasets, we can ask: Do bonobos who groom each other correlate with bonobos who GG rub with one another?
We can start by using the gcor() function to get a correlation coefficient between the two networks.
gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054
These networks don’t seem very correlated but let’s test for significance anyways.
qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
gcor, # the function you're using is correlation between networks (gcor)
g1=1, # use first graph in the list
g2=2, # use second graph in the list
reps = 1000) # number of permutations to run
summary(qap_cor)
##
## QAP Test Results
##
## Estimated p-values:
## p(f(perm) >= f(d)): 0.301
## p(f(perm) <= f(d)): 0.842
##
## Test Diagnostics:
## Test Value (f(d)): 0.1111054
## Replications: 1000
## Distribution Summary:
## Min: -0.4191705
## 1stQ: -0.1010049
## Med: 0.005050247
## Mean: -0.004812886
## 3rdQ: 0.1111054
## Max: 0.8534918
[DESCRIBE/INTERPRET OUTPUT]
Should we plot these and explain them? Is that helpful at all? -sam
Multiple regression QAPs test the extent to which predictor variables
are affecting edge weights.
For example, if our predictor variable
is rank, we can ask: Are high-ranking or low-ranking individuals
more tied through GG rubbing? In other words, we are asking:
how many times are you likely to have more edges for each unit increase
in rank?
Predictor variables must be matrices in order to be used in the QAP functions. We can look at whether rank affects the weight of ties sent or received (or both, but they must be added to the model separately).
First, we must create a matrix that shows the rank of senders. Note that senders are represented by ROWS in the matrices.
nodes <- 7 # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in
rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the age of each actor is repeated over entire ROW of matrix
rank_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
Next, we must create a matrix that shows the rank of receivers. Note that receivers are represented by COLUMNS in the matrices.
rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the age of each actor is repeated over entire COLUMN of matrix
rank_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our multiple regression! First we are asking: Does rank affect how many times bonobos G-G rubbed others (the weight of the edges they sent)?
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending)) # list of predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4534205 -0.3093578 -0.2036344 0.5465795 0.8695587
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.8507548 0.996 0.004 0.005
## x1 -0.1161796 0.014 0.986 0.034
##
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227 Adjusted R-squared: 0.03883
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.52845 -1.88290
## 1stQ -0.56955 -0.55450
## Median 0.14564 0.01265
## Mean 0.11505 0.02629
## 3rdQ 0.83359 0.52868
## Max 2.61709 1.87439
[DESCRIBE/INTERPRET OUTPUT]
Does rank affect how many times bonobos received G-G rubbing from others (the weight of the edges they receive)?
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.3992884 -0.3393159 -0.2434687 0.6007116 0.8021901
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.64715036 0.992 0.008 0.011
## x1 -0.07247427 0.049 0.951 0.093
##
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423 Adjusted R-squared: -0.0001605
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.116679 -1.233718
## 1stQ -0.505828 -0.507257
## Median 0.064646 -0.007029
## Mean 0.073798 -0.020126
## 3rdQ 0.664057 0.479828
## Max 2.053671 1.347807
[DESCRIBE/INTERPRET OUTPUT]
We can include multiple predictor variables! We can also use another network as a predictor variable! For example, we can ask: Do sender rank and grooming relationships predict the number of times bonobos G-G rubbed in the network?
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving, bonobo_groom_net)) # list of all predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4834824 -0.3406152 -0.2400259 0.5709808 0.8093067
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.63034548 0.992 0.008 0.010
## x1 -0.07893218 0.030 0.970 0.071
## x2 0.12308498 0.787 0.213 0.479
##
## Residual standard error: 0.4699 on 39 degrees of freedom
## Multiple R-squared: 0.04076 Adjusted R-squared: -0.008433
## F-statistic: 0.8286 on 2 and 39 degrees of freedom, p-value: 0.4442
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1 x2
## Min -1.95987 -1.28828 -2.89829
## 1stQ -0.53884 -0.37780 -0.89607
## Median 0.04723 0.02143 -0.07872
## Mean 0.04834 0.02077 0.08003
## 3rdQ 0.62551 0.45594 0.72284
## Max 1.98692 1.57047 10.38170
[DESCRIBE/INTERPRET OUTPUT]
Linear regression QAPs test the extent to which independent variables
are affecting the presence or absence of edges.
Using our bonobo
dataset, we can ask: Does older age make an individual more or
less likely to G-G rub with other individuals?
In this
example, our grooming network is the response variable while age is our
predictor variable.
Similar to the multiple regression, we must first create a matrix that shows the rank of senders and receivers. As a reminder, senders are represented by ROWS while receivers are represented by COLUMNS in matrices
nodes <- 7 # number of nodes in the data set
age <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we're interested in
age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create empty matrix to be filled
age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
age_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
for (i in 1:nodes){
age_receiving[,i] <- rep(age[i], nodes)}
age_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our linear regression!
lrqap <- netlogit(bonobo_ggr_net, # response variable is a network object with an unweighted adjacency matrix
list(age_receiving, age_sending)) # list of all predictor variables as network or matrix objects
summary(lrqap)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.5524763 94.8670398 0.971 0.029 0.055
## x1 -0.4903676 0.6124013 0.030 0.970 0.069
## x2 -0.6817886 0.5057117 0.009 0.991 0.026
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
## 10.77041 on 3 degrees of freedom, p-value 0.01303442
## AIC: 53.45396 BIC: 58.66696
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2040994
## (Dn-Dr)/Dn: 0.1849811
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 11
## 1 4 2
##
## Total Fraction Correct: 0.6428571
## Fraction Predicted 1s Correct: 0.3333333
## Fraction Predicted 0s Correct: 0.6944444
## False Negative Rate: 0.8461538
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2
## Min -2.04329 -1.57284 -2.01549
## 1stQ -0.60970 -0.48545 -0.71649
## Median 0.08668 0.05638 -0.06637
## Mean 0.06877 0.04245 -0.04359
## 3rdQ 0.74504 0.62798 0.53998
## Max 2.04528 1.67968 2.02638
In the output above, x1 represents the first variable in the model (age_receiving), and x2 represents the second variable in the model (age_sending). [DESCRIBE/INTERPRET OUTPUT]
[…]